[None][feat] Add prefix-aware scheduling config flag to support opt-out by SimengLiu-nv · Pull Request #15526 · NVIDIA/TensorRT-LLM

SimengLiu-nv · 2026-06-22T21:27:32Z

Summary by CodeRabbit

New Features
- Added enable_prefix_aware_scheduling configuration option to the scheduler, allowing fine-grained control over prefix-aware scheduling behavior for KV cache optimization (enabled by default).
Documentation
- Updated guides with information on the new scheduler configuration option and its interaction with KV cache block reuse settings.

Description

Test Coverage

C++ focused gtests:
CapacitySchedulerTest.PrefixAwareSchedulingDisabledDoesNotDelayDuplicateRequest
CombinedSchedulerTest.PrefixAwareSchedulingDisabledKeepsReusableTokensZero
SerializeUtilsTest.SchedulerConfig

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

SimengLiu-nv · 2026-06-22T21:29:25Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-22T21:36:21Z

PR_Github #55084 [ run ] triggered by Bot. Commit: 61321d5 Link to invocation

coderabbitai · 2026-06-22T21:36:56Z

Walkthrough

Introduces a new enable_prefix_aware_scheduling boolean flag (default true) in SchedulerConfig that controls whether schedulers use KV prefix-reuse estimates for admission and token-budget decisions. The flag propagates through C++ schedulers (MaxUtilizationScheduler, GuaranteedNoEvictScheduler, StaticBatchScheduler), Python schedulers (PyCapacityScheduler, KVCacheV2Scheduler), nanobind bindings, serialization, and the public Python API, with corresponding tests and documentation.
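The budgeting effect of the flag can be illustrated with a small Python sketch. This is not the TensorRT-LLM implementation: `SchedulerConfigSketch` and `tokens_to_charge` are hypothetical names. Only the flag name and its default of true come from the walkthrough; the charging rule itself is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerConfigSketch:
    """Hypothetical stand-in for the scheduler config; not the real class."""

    # Flag name and default are taken from the PR summary.
    enable_prefix_aware_scheduling: bool = True


def tokens_to_charge(prompt_len: int, reusable_tokens: int,
                     config: SchedulerConfigSketch) -> int:
    """Return how many context tokens count against the token budget.

    Assumption: with the flag on, tokens expected to come from reused KV
    blocks are not charged; with the flag off, the estimate is ignored and
    the full prompt length is charged.
    """
    if not config.enable_prefix_aware_scheduling:
        return prompt_len
    # Clamp the estimate so it can never exceed the prompt or go negative.
    reusable = max(0, min(reusable_tokens, prompt_len))
    return prompt_len - reusable


default_cfg = SchedulerConfigSketch()
opt_out_cfg = SchedulerConfigSketch(enable_prefix_aware_scheduling=False)

print(tokens_to_charge(1000, 768, default_cfg))  # 232
print(tokens_to_charge(1000, 768, opt_out_cfg))  # 1000
```

Under these assumptions, opting out makes admission more conservative: a request is budgeted as if nothing in the cache could be reused.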

Changes

Prefix-Aware Scheduling Flag

| Layer / File(s) | Summary |
| --- | --- |
| SchedulerConfig contract and serialization `cpp/include/tensorrt_llm/executor/executor.h`, `cpp/tensorrt_llm/executor/schedulerConfig.cpp`, `cpp/tensorrt_llm/executor/serialization.cpp`, `tensorrt_llm/llmapi/llm_args.py` | `SchedulerConfig` gains the `enablePrefixAwareScheduling` constructor parameter, getter, equality comparison, and serialization round-trip; the Pydantic `SchedulerConfig` adds the field and passes it through `_to_pybind`. |
| C++ scheduler header declarations `cpp/include/tensorrt_llm/batch_manager/capacityScheduler.h` | Constructor signatures for `MaxUtilizationScheduler`, `GuaranteedNoEvictScheduler`, `StaticBatchScheduler`, and `CapacityScheduler` gain `enablePrefixAwareScheduling = true`; the first two classes gain a `mEnablePrefixAwareScheduling` private member. |
| C++ scheduler behavior when flag is false `cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp` | `skippingIsRelevant` and `analyzePrefixReuse` calls are gated on `mEnablePrefixAwareScheduling`; when disabled, a default `PrefixReuseSummary` is substituted and chunked-context contributed-block tracking is skipped; `CapacityScheduler` threads the flag into all three policy constructors. |
| C++ executor wiring `cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp`, `cpp/tensorrt_llm/batch_manager/trtEncoderModel.cpp` | Both executor models now read `getEnablePrefixAwareScheduling()` from the executor's scheduler config and pass it into the `CapacityScheduler` constructor. |
| Nanobind and Python binding extensions `cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp`, `cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp` | `CapacityScheduler` nanobind binding gains the `enable_prefix_aware_scheduling` argument; `SchedulerConfig` binding gains the constructor parameter, backward-compatible pickle state (size-3 or size-4 tuple), and a new read-only property. |
| Python scheduler behavior (PyCapacityScheduler / BindCapacityScheduler / SimpleUnifiedScheduler) `tensorrt_llm/_torch/pyexecutor/scheduler/scheduler.py`, `tensorrt_llm/_torch/pyexecutor/_util.py` | Adds `_disabled_prefix_summary()` helper; gates `_is_skipping_relevant`, `_prefill_contributed_blocks`, and `_beneficial_to_skip` on the flag; updates `GuaranteedNoEvictPolicy` and `MaxUtilizationPolicy` summary lookup with the disabled fallback; wires the flag through `_util.py` into all three scheduler construction paths. |
| KVCacheV2Scheduler pre/post-prepare_context budgeting `tensorrt_llm/_torch/pyexecutor/scheduler/scheduler_v2.py` | `KVCacheV2Scheduler` stores the flag and conditionally checks token budget before `prepare_context` (disabled mode) or after (enabled mode) in both `_try_schedule_context_full` and `_try_schedule_context_chunked`; KV-resize always uses post-`prepare_context` length. |
| C++ unit tests `cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp`, `cpp/tests/unit_tests/batch_manager/microBatchSchedulerTest.cpp`, `cpp/tests/unit_tests/executor/serializeUtilsTest.cpp` | New tests assert zero reusable tokens and no request staggering for duplicate requests under both scheduler policies when the flag is false; serialization round-trip coverage added. |
| Python tests and binding tests `tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py`, `tests/unittest/_torch/executor/test_py_scheduler.py`, `tests/unittest/bindings/test_executor_bindings.py`, `tests/unittest/llmapi/test_llm_args.py` | New tests for `KVCacheV2Scheduler` pre-reuse budget charging, `PyCapacityScheduler` duplicate-request non-delay, and binding pickle/property preservation; boolean assertion style tightened to identity checks across existing `test_llm_args.py` tests. |
| Design doc, user-facing docs, golden manifest, and telemetry schema `reviews/designs/enable_prefix_aware_scheduling.md`, `docs/source/features/kvcache.md`, `docs/source/developer-guide/telemetry.md`, `tensorrt_llm/usage/llm_args_golden_manifest.json`, `tensorrt_llm/usage/schemas/README.md` | Design document added; `kvcache.md` explains the flag's effect; telemetry tables and schemas updated; golden manifest extended with the new scheduler field plus unrelated KV-cache and sparse-attention fields. |
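The "size-3 or size-4 tuple" pickle state mentioned for the `SchedulerConfig` binding can be sketched in plain Python. The class below is hypothetical and only mirrors the idea: older state tuples lack the new flag, so restoring them falls back to a default. The field names other than `enable_prefix_aware_scheduling`, the placeholder policy string, and the choice of `True` as the fallback are assumptions, not taken from the nanobind code.

```python
class SchedulerConfigState:
    """Hypothetical config object illustrating backward-compatible state."""

    def __init__(self, capacity_policy="guaranteed_no_evict",
                 chunking_policy=None, dynamic_batch_config=None,
                 enable_prefix_aware_scheduling=True):
        self.capacity_policy = capacity_policy
        self.chunking_policy = chunking_policy
        self.dynamic_batch_config = dynamic_batch_config
        self.enable_prefix_aware_scheduling = enable_prefix_aware_scheduling

    def __getstate__(self):
        # New-style state always carries four entries.
        return (self.capacity_policy, self.chunking_policy,
                self.dynamic_batch_config,
                self.enable_prefix_aware_scheduling)

    def __setstate__(self, state):
        if len(state) not in (3, 4):
            raise ValueError(f"unexpected state size: {len(state)}")
        (self.capacity_policy, self.chunking_policy,
         self.dynamic_batch_config) = state[:3]
        # Assumption: legacy 3-tuples predate the flag, so use the default.
        self.enable_prefix_aware_scheduling = (
            state[3] if len(state) == 4 else True)


def restore(state):
    """Rebuild an object from a state tuple, the way pickle would."""
    obj = SchedulerConfigState.__new__(SchedulerConfigState)
    obj.__setstate__(state)
    return obj


legacy = restore(("max_utilization", None, None))
current = restore(SchedulerConfigState(
    enable_prefix_aware_scheduling=False).__getstate__())
print(legacy.enable_prefix_aware_scheduling)   # True
print(current.enable_prefix_aware_scheduling)  # False
```

`pickle` calls the same `__getstate__` and `__setstate__` hooks, so a payload written before the flag existed would still load under this scheme.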

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14398: Directly shares the tensorrt_llm/usage/llm_args_golden_manifest.json and telemetry documentation pipeline that this PR also extends with the new scheduler_config.enable_prefix_aware_scheduling entry.

Suggested labels

SW Architecture

Suggested reviewers

arysef
syuoni
venkywonka
nv-guomingz
chang-l
karljang

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.30%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description provides test coverage details but lacks explanation of the issue, solution, and design rationale. The description template sections for 'Description' and 'Solution' are not filled in. | Add a 'Description' section explaining the problem/motivation and a 'Solution' section describing how the prefix-aware scheduling flag works and its impact. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly identifies the main feature: adding a prefix-aware scheduling config flag with opt-out capability. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (5)

tests/unittest/llmapi/test_llm_args.py (1)
478-503: 📐 Maintainability & Code Quality | 🔵 Trivial

Coverage status: sufficient in tests/unittest/llmapi/test_llm_args.py.

test_SchedulerConfig_declaration now validates both default True and explicit False propagation to pybind, which is the key contract for this PR in this file. No follow-up needed here.

As per path instructions, tests/** reviews should state whether coverage is sufficient or needs follow-up.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/llmapi/test_llm_args.py` around lines 478 - 503, No changes
are needed. The test_SchedulerConfig_declaration function already provides
sufficient coverage by validating both the default True value for
enable_prefix_aware_scheduling and the explicit False value propagation to
pybind through the PybindMirror conversion, which fulfills the key contract
requirements for this PR.
Source: Path instructions
tests/unittest/bindings/test_executor_bindings.py (1)
2495-2502: 📐 Maintainability & Code Quality | 🔵 Trivial

Coverage status: sufficient in tests/unittest/bindings/test_executor_bindings.py (after the lint fix above).

The pickle and nested ExecutorConfig assertions adequately verify round-trip preservation of enable_prefix_aware_scheduling; no extra follow-up test is needed for this file.

As per path instructions, tests/** reviews should state whether coverage is sufficient or needs follow-up.

Also applies to: 2644-2645, 2703-2703
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/bindings/test_executor_bindings.py` around lines 2495 - 2502,
This is a review approval confirming that the test_scheduler_config_pickle
function adequately verifies round-trip preservation of the
enable_prefix_aware_scheduling attribute through pickle
serialization/deserialization with the appropriate assertions on the
SchedulerConfig object. No code changes are required as the test coverage is
sufficient and meets the stated requirements.
Source: Path instructions
tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py (1)
175-211: 📐 Maintainability & Code Quality | 🔵 Trivial

Coverage status: sufficient in tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py.

The added cases cover constructor wiring and disabled-prefix-aware budget semantics with explicit assertions; no follow-up coverage is needed in this PR for this file.

As per path instructions, tests/** reviews should state whether coverage is sufficient or needs follow-up.

Also applies to: 1987-2024
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py` around lines
175 - 211, The review comment is confirming that test coverage is sufficient and
properly documented per path instructions for test files. No code fix is
required - the comment is an approval noting that the test cases in
make_scheduler and related tests adequately cover constructor wiring and
disabled-prefix-aware budget semantics with explicit assertions, meeting the
requirement to document coverage status in tests/** files.
Source: Path instructions
tests/unittest/_torch/executor/test_py_scheduler.py (1)
2852-2880: 📐 Maintainability & Code Quality | 🔵 Trivial

Coverage status: sufficient in tests/unittest/_torch/executor/test_py_scheduler.py.

This addition directly validates the disabled path (including “must not call prefix-reuse analysis”) and is adequate for this PR scope.

As per path instructions, tests/** reviews should state whether coverage is sufficient or needs follow-up.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/executor/test_py_scheduler.py` around lines 2852 -
2880, This review comment is a positive confirmation that the test coverage is
sufficient and no fixes are required. The test method
test_prefix_aware_scheduling_disabled_does_not_delay_duplicates adequately
validates the disabled path for prefix-aware scheduling by ensuring that the
analyze_prefix_reuse method is not called when enable_prefix_aware_scheduling is
False. No code changes are needed based on this review comment.
Source: Path instructions
cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp (1)
321-335: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Guard remaining prefix-reuse tree walk on disabled mode.

When prefix-aware scheduling is disabled, this block bypasses first-chunk reuse analysis, but the encoder-init cross-summary path still performs analyzePrefixReuse later even though skip logic is disabled in that mode. Add the same flag guard there to avoid unnecessary radix-tree walks.
♻️ Suggested diff
-                else if (isEncoderInit && crossKvCacheManager && crossKvCacheManager->isEnableBlockReuse()
+                else if (mEnablePrefixAwareScheduling && isEncoderInit && crossKvCacheManager
+                    && crossKvCacheManager->isEnableBlockReuse()
                     && !crossKvCacheManager->getBlockManager().isVariableWindow())
                 {
                     // Encoder admission only needs the cross summary for reuse ordering.
                     auto uniqueTokens = *(req->getEncoderUniqueTokens().value());
                     crossSummary = crossKvCacheManager->analyzePrefixReuse(uniqueTokens, *req);
                 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp` around lines 321 - 335,
The code correctly guards the main kvCacheManager's prefix reuse analysis with
mEnablePrefixAwareScheduling flag, but the subsequent block that handles the
cross-summary path with crossKvCacheManager still performs analyzePrefixReuse
even when scheduling is disabled. Add the same mEnablePrefixAwareScheduling
check along with the necessary block reuse and variable window guards before the
crossKvCacheManager analyzePrefixReuse call to match the pattern used for
kvCacheManager and prevent unnecessary radix-tree walks when prefix-aware
scheduling is disabled.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/tensorrt_llm/executor/schedulerConfig.cpp`:
- Around line 33-37: The SchedulerConfig::operator== method is incomplete
because it does not include mDynamicBatchConfig in its equality comparison.
Currently it only checks mCapacitySchedulerPolicy, mContextChunkingPolicy, and
mEnablePrefixAwareScheduling. Add a comparison of mDynamicBatchConfig to the
return statement using the logical AND operator to ensure that two
SchedulerConfig objects with different dynamic batch configurations will
correctly compare as not equal.

In `@cpp/tests/unit_tests/batch_manager/microBatchSchedulerTest.cpp`:
- Around line 1413-1435: Add a precondition assert after the capacityScheduler
call (which produces scheduled1) to verify that req1 is present in scheduled1
before proceeding to the microBatchScheduler checks. This ensures the subsequent
assertion checking that req1InCtx is false is testing the correct behavior (that
microBatchScheduler filtered out req1 due to token budget constraints) rather
than passing vacuously if req1 was already absent from scheduled1. Use
ASSERT_TRUE or EXPECT_TRUE with std::any_of to confirm req1 exists in scheduled1
before the microBatchScheduler call.

In `@tests/unittest/bindings/test_executor_bindings.py`:
- Around line 1409-1410: The assertions on the
config.dynamic_batch_config.enable_batch_size_tuning and
config.dynamic_batch_config.enable_max_num_tokens_tuning properties are using
explicit `== True` comparisons, which violates the E712 linting rule. Fix this
by removing the `== True` part from both assertions and relying on the
truthiness check of the boolean properties directly. This means simplifying each
assertion to just check the boolean property itself without the explicit
comparison operator.
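The `operator==` finding above describes a common pitfall: a hand-written equality that lists fields one by one can silently miss one. A minimal Python sketch of the same idea follows. The class and field names are hypothetical mirrors of the members named in the comment, and this is not the C++ fix itself. Deriving the comparison from the full field list means a newly added field cannot be forgotten.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(eq=False)
class SchedulerConfigEq:
    """Hypothetical mirror of the members named in the review comment."""

    capacity_scheduler_policy: str = "guaranteed_no_evict"
    context_chunking_policy: Optional[str] = None
    dynamic_batch_config: Optional[Tuple[bool, bool]] = None
    enable_prefix_aware_scheduling: bool = True

    def __eq__(self, other):
        if not isinstance(other, SchedulerConfigEq):
            return NotImplemented
        # Compare every declared field, so none can be skipped by accident.
        return all(getattr(self, f.name) == getattr(other, f.name)
                   for f in fields(self))


a = SchedulerConfigEq(dynamic_batch_config=(True, False))
b = SchedulerConfigEq(dynamic_batch_config=(False, False))
c = SchedulerConfigEq(dynamic_batch_config=(True, False))

print(a == b)  # False: only dynamic_batch_config differs
print(a == c)  # True
```

In C++ the equivalent is to add the missing member to the `&&` chain, as the comment suggests, or to default the comparison where the language version and member types allow it.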


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c226c76-7bfb-4363-98cf-e6d650c30a47

📥 Commits

Reviewing files that changed from the base of the PR and between f536026 and 61321d5.

📒 Files selected for processing (25)

cpp/include/tensorrt_llm/batch_manager/capacityScheduler.h
cpp/include/tensorrt_llm/executor/executor.h
cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp
cpp/tensorrt_llm/batch_manager/trtEncoderModel.cpp
cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp
cpp/tensorrt_llm/executor/schedulerConfig.cpp
cpp/tensorrt_llm/executor/serialization.cpp
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp
cpp/tests/unit_tests/batch_manager/microBatchSchedulerTest.cpp
cpp/tests/unit_tests/executor/serializeUtilsTest.cpp
docs/source/developer-guide/telemetry.md
docs/source/features/kvcache.md
reviews/designs/enable_prefix_aware_scheduling.md
tensorrt_llm/_torch/pyexecutor/_util.py
tensorrt_llm/_torch/pyexecutor/scheduler/scheduler.py
tensorrt_llm/_torch/pyexecutor/scheduler/scheduler_v2.py
tensorrt_llm/llmapi/llm_args.py
tensorrt_llm/usage/llm_args_golden_manifest.json
tensorrt_llm/usage/schemas/README.md
tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py
tests/unittest/_torch/executor/test_py_scheduler.py
tests/unittest/bindings/test_executor_bindings.py
tests/unittest/llmapi/test_llm_args.py

tensorrt-cicd · 2026-06-23T04:19:47Z

PR_Github #55084 [ run ] completed with state SUCCESS. Commit: 61321d5
/LLM/main/L0_MergeRequest_PR pipeline #44070 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

SimengLiu-nv · 2026-06-23T20:00:52Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-23T20:06:41Z

PR_Github #55324 [ run ] triggered by Bot. Commit: c34ff84 Link to invocation

tensorrt-cicd · 2026-06-24T02:50:14Z

PR_Github #55324 [ run ] completed with state FAILURE. Commit: c34ff84
/LLM/main/L0_MergeRequest_PR pipeline #44275 completed with status: 'FAILURE'

CI Agent Failure Analysis

Link to invocation

tongyuantongyu

LGTM.

Nit: the name enable_prefix_aware_scheduling feels a bit redundant. prefix_aware is probably enough here.

chang-l

Approval for doc changes.

SimengLiu-nv · 2026-06-29T16:17:09Z

/bot run

tensorrt-cicd · 2026-06-29T16:22:45Z

PR_Github #56407 [ run ] triggered by Bot. Commit: 9ac53b3 Link to invocation

tensorrt-cicd · 2026-06-29T18:12:23Z

PR_Github #56407 [ run ] completed with state SUCCESS. Commit: 9ac53b3
/LLM/main/L0_MergeRequest_PR pipeline #45249 completed with status: 'FAILURE'

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Simeng Liu <simengl@nvidia.com>

SimengLiu-nv · 2026-06-29T20:08:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-29T20:14:14Z

PR_Github #56451 [ run ] triggered by Bot. Commit: bcd57ce Link to invocation

tensorrt-cicd · 2026-06-30T03:55:51Z

PR_Github #56451 [ run ] completed with state FAILURE. Commit: bcd57ce
/LLM/main/L0_MergeRequest_PR pipeline #45293 completed with status: 'FAILURE'

CI Agent Failure Analysis

Link to invocation

SimengLiu-nv requested review from a team as code owners June 22, 2026 21:27

SimengLiu-nv requested review from QiJune, hchings, joyang-nv and laikhtewari June 22, 2026 21:27

github-actions Bot assigned SimengLiu-nv Jun 22, 2026

SimengLiu-nv changed the title ~~Add prefix-aware scheduling config flag~~ [None][feat] Add prefix-aware scheduling config flag to support opt-out Jun 22, 2026

coderabbitai Bot reviewed Jun 22, 2026

Comment thread cpp/tensorrt_llm/executor/schedulerConfig.cpp

Comment thread cpp/tests/unit_tests/batch_manager/microBatchSchedulerTest.cpp

Comment thread tests/unittest/bindings/test_executor_bindings.py

SimengLiu-nv force-pushed the prefix-aware-flag branch from 61321d5 to 3b04d8c Compare June 22, 2026 21:49

tburt-nv approved these changes Jun 23, 2026

Comment thread tests/unittest/bindings/test_executor_bindings.py Outdated

Funatiq reviewed Jun 24, 2026

Comment thread cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp Outdated

Comment thread cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp Outdated

Comment thread cpp/tensorrt_llm/executor/schedulerConfig.cpp Outdated

hchings approved these changes Jun 24, 2026

joyang-nv requested a review from tongyuantongyu June 25, 2026 01:39

tongyuantongyu approved these changes Jun 25, 2026

joyang-nv approved these changes Jun 25, 2026

Funatiq approved these changes Jun 25, 2026

SimengLiu-nv force-pushed the prefix-aware-flag branch from e11a9ed to 9ac53b3 Compare June 26, 2026 21:59

chang-l approved these changes Jun 26, 2026

SimengLiu-nv added 3 commits June 29, 2026 13:08

[None][feat] Add prefix-aware scheduling config flag to support opt-out

31ab2aa

Signed-off-by: Simeng Liu <simengl@nvidia.com>

Address open-rabbit review comments

02cd4bf

Signed-off-by: Simeng Liu <simengl@nvidia.com>

Address comments from Robin and Tyler.

bcd57ce

Signed-off-by: Simeng Liu <simengl@nvidia.com>

SimengLiu-nv force-pushed the prefix-aware-flag branch from 9ac53b3 to bcd57ce Compare June 29, 2026 20:08

8 participants
